Comparing Recurring Lexico-Syntactic Trees (RLTs) and Ngram Techniques for Extended Phraseology Extraction
Authors
Abstract
This paper assesses to what extent a syntax-based method, Recurring Lexico-Syntactic Tree (RLT) extraction, can extract extended phraseological units such as the prefabricated routines of scientific writing (e.g. "as previously said" or "as far as we/I know"). To evaluate this method, we compare it with the classical n-gram extraction technique on a subset of recurring segments containing speech verbs in a French corpus of scientific writing. Results show that RLT extraction is far more accurate for extended multiword expressions (MWEs) such as routines or collocations, but performs worse on surface phenomena such as syntactic constructions or fully frozen expressions.
Similar resources
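The n-gram baseline named in the abstract can be illustrated with a minimal sketch. It assumes pre-tokenized sentences, contiguous n-grams of 2 to 5 tokens, and a plain frequency threshold. The function names `recurring_ngrams` and `filter_by_pivot`, the toy French sentences, and the thresholds are hypothetical and do not reproduce the authors' setup. The sketch counts every contiguous n-gram and keeps the recurring ones that contain a given speech-verb form.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def recurring_ngrams(
    sentences: Iterable[List[str]], n_min: int = 2, n_max: int = 5, min_freq: int = 2
) -> Dict[NGram, int]:
    """Count contiguous n-grams (n_min..n_max tokens); keep those seen at least min_freq times."""
    counts: Counter = Counter()
    for tokens in sentences:
        for n in range(n_min, n_max + 1):
            # Slide a window of n tokens over the sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i : i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def filter_by_pivot(ngrams: Dict[NGram, int], pivots: Set[str]) -> Dict[NGram, int]:
    """Keep only n-grams containing at least one pivot token (e.g. a speech-verb form)."""
    return {gram: c for gram, c in ngrams.items() if any(tok in pivots for tok in gram)}


# Toy, whitespace-tokenized French sentences (hypothetical data, not the paper's corpus).
corpus = [
    "comme nous l' avons dit plus haut".split(),
    "comme nous l' avons dit dans la section 2".split(),
    "comme nous l' avons déjà dit".split(),
]

# Hypothetical pivot: a single speech-verb form.
segments = filter_by_pivot(recurring_ngrams(corpus), pivots={"dit"})
for gram, freq in sorted(segments.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(freq, " ".join(gram))
```

On this toy input, the 5-gram "comme nous l' avons dit" is counted only twice. The adverb "déjà" in the third sentence breaks the contiguous sequence, so that occurrence of the routine is missed. This illustrates a known limitation of contiguous n-grams for discontinuous routines, which a method based on syntactic trees is not bound by in the same way.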
Using Lexico-Syntactic Ontology Design Patterns for Ontology Creation and Population
In this paper we discuss the use of information extraction techniques involving lexico-syntactic patterns to generate ontological information from unstructured text and either create a new ontology from scratch or augment an existing ontology with new entities. We refine the patterns using a term extraction tool and some semantic restrictions derived from WordNet and VerbNet, in order to preven...
Freepal: A Large Collection of Deep Lexico-Syntactic Patterns for Relation Extraction
The increasing availability and maturity of both scalable computing architectures and deep syntactic parsers is opening up new possibilities for Relation Extraction (RE) on large corpora of natural language text. In this paper, we present FREEPAL, a resource designed to assist with the creation of relation extractors for more than 5,000 relations defined in the FREEBASE knowledge base (KB). The...
Recognition of Structured Collocations in An Inflective Language
We present a method for extracting structural collocations in an inflective language (Polish), based on a two-phase process: extraction and filtering of pairs of word forms reduced to base forms, followed by structural annotation of the extracted collocations with lexico-syntactic patterns. The parameters of the patterns are specified manually but their instances are generated and t...
Extraction of Semantic Relationships from Academic Papers using Syntactic Patterns
Integrating concept and citation networks on a specific research subject can help researchers focus their own work or use methods described in prior works. In this paper, we propose a method to extract semantic relations from concepts and citation in the descriptions of related work. Specifically, we examined (i) topic-paper relations between research topics and reference papers and (ii) method...
Combining Statistical Techniques and Lexico-syntactic Patterns for Semantic Relations Extraction from Text
We describe here a methodology to combine two different techniques for Semantic Relation Extraction from texts. On the one hand, generic lexicosyntactic patterns are applied to the linguistically analyzed corpus to detect a first set of pairs of co-occurring words, possibly involved in “syntagmatic” relations. On the other hand, a statistical unsupervised association system is used to obtain a ...